- Pre-training is a powerful paradigm in machine learning for passing information across models. For example, suppose one has a modest-sized dataset of images of cats and dogs and plans to fit a deep neural network to classify them. With pre-training, we start with a neural network trained on a large corpus of images spanning not just cats and dogs but hundreds of classes. We fix all network weights except the top layer(s) and fine-tune on our dataset. This often results in dramatically better performance than training solely on our dataset. Here, we ask: ‘Can pre-training help the lasso?’ We propose a framework in which the lasso is fit on a large dataset and then fine-tuned on a smaller dataset. The latter can be a subset of the original, or have a different but related outcome. This framework has a wide variety of applications, including stratified and multi-response models. In the stratified model setting, lasso pre-training first estimates coefficients common to all groups, then estimates group-specific coefficients during fine-tuning. Under appropriate assumptions, support recovery of the common coefficients is superior to the usual lasso trained on individual groups. This separate identification of common and individual coefficients also aids scientific understanding.
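A minimal sketch of the pre-train-then-fine-tune idea described above, assuming scikit-learn's Lasso, synthetic data, and a simplified residual-based fine-tuning step; the split into common and group-specific coefficients via a residual fit is an illustrative assumption, not the authors' exact procedure, and all names and penalty values are made up for the example.

```python
# Illustrative sketch only: lasso "pre-training" on a large pooled dataset,
# then "fine-tuning" group-specific corrections on a small group's residuals.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_big, n_small, p = 2000, 100, 50

# Large pooled dataset (all groups share a sparse common signal).
X_big = rng.normal(size=(n_big, p))
beta_common = np.concatenate([np.ones(5), np.zeros(p - 5)])
y_big = X_big @ beta_common + rng.normal(scale=0.5, size=n_big)

# Small target group with an extra group-specific signal.
X_grp = rng.normal(size=(n_small, p))
beta_grp = np.zeros(p)
beta_grp[5:8] = 2.0
y_grp = X_grp @ (beta_common + beta_grp) + rng.normal(scale=0.5, size=n_small)

# Step 1: "pre-train" the lasso on the large dataset -> common coefficients.
pre = Lasso(alpha=0.05, max_iter=10_000).fit(X_big, y_big)

# Step 2: "fine-tune" on the small group, fitting only the group-specific
# correction to the residuals left by the common fit.
resid = y_grp - X_grp @ pre.coef_ - pre.intercept_
fine = Lasso(alpha=0.05, max_iter=10_000).fit(X_grp, resid)

beta_hat = pre.coef_ + fine.coef_  # combined estimate for this group
print("common support:", np.flatnonzero(pre.coef_))
print("group-specific support:", np.flatnonzero(fine.coef_))
```

In this toy setup the first fit recovers the shared support from the large dataset, and the second fit only has to identify the small number of group-specific deviations, which mirrors the separation of common and individual coefficients described in the abstract.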
- We introduce the CRONOS algorithm for convex optimization of two-layer neural networks. CRONOS is the first algorithm capable of scaling to high-dimensional datasets such as ImageNet, which are ubiquitous in modern deep learning. This significantly improves upon prior work, which has been restricted to downsampled versions of MNIST and CIFAR-10. Taking CRONOS as a primitive, we then develop a new algorithm called CRONOS-AM, which combines CRONOS with alternating minimization, to obtain an algorithm capable of training multilayer networks with arbitrary architectures. Our theoretical analysis proves that CRONOS converges to the global minimum of the convex reformulation under mild assumptions. In addition, we validate the efficacy of CRONOS and CRONOS-AM through extensive large-scale numerical experiments with GPU acceleration in JAX. Our results show that CRONOS-AM can obtain comparable or better validation accuracy than predominant tuned deep learning optimizers on vision and language tasks with benchmark datasets such as ImageNet and IMDb. To the best of our knowledge, CRONOS is the first algorithm which utilizes the convex reformulation to enhance performance on large-scale learning tasks.
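A minimal sketch of the kind of convex reformulation of a two-layer ReLU network that CRONOS builds on; this is not the CRONOS algorithm itself (which uses ADMM-style updates with GPU acceleration in JAX) but a small gated-ReLU convex problem solved with plain proximal gradient descent, with all sizes, gate counts, and penalty values chosen arbitrarily for illustration.

```python
# Illustrative sketch: a convex (gated-ReLU) surrogate for a two-layer ReLU
# network, solved by proximal gradient with a group-lasso penalty.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 10, 32                      # samples, features, sampled gates
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])       # toy +/-1 targets

# Sample m random ReLU activation patterns ("gates") D_i = diag(1[X g_i >= 0]).
G = rng.normal(size=(d, m))
D = (X @ G >= 0).astype(float)             # n x m matrix of gate indicators

# Stack the gated features F = [D_1 X, ..., D_m X]  (n x m*d).
F = np.hstack([D[:, [i]] * X for i in range(m)])

# Convex objective: (1/2)||F v - y||^2 + beta * sum_i ||v_i||_2.
beta = 1e-2
step = 1.0 / np.linalg.norm(F, 2) ** 2     # 1/L for the smooth part
V = np.zeros(m * d)
for _ in range(500):
    V -= step * (F.T @ (F @ V - y))        # gradient step on the smooth part
    Vb = V.reshape(m, d)                   # group soft-thresholding (prox)
    norms = np.linalg.norm(Vb, axis=1, keepdims=True)
    Vb *= np.maximum(1 - step * beta / np.maximum(norms, 1e-12), 0)
    V = Vb.ravel()

print("training accuracy:", (np.sign(F @ V) == y).mean())
```

Because the objective is convex in the stacked coefficients, any reasonable first-order method reaches the global minimum; the contribution of CRONOS, per the abstract, is making this reformulation tractable at the scale of datasets like ImageNet.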
- Gradient coding is a method for mitigating straggling servers in a centralized computing network that uses erasure-coding techniques to distributively carry out first-order optimization methods. Randomized numerical linear algebra uses randomization to develop improved algorithms for large-scale linear algebra computations. In this paper, we propose a method for distributed optimization that combines gradient coding and randomized numerical linear algebra. The proposed method uses a randomized ℓ2-subspace embedding and a gradient coding technique to distribute blocks of data to the computational nodes of a centralized network, and at each iteration the central server only requires a small number of computations to obtain the steepest descent update. The novelty of our approach is that the data is replicated according to importance scores, called block leverage scores, in contrast to most gradient coding approaches that uniformly replicate the data blocks. Furthermore, we do not require a decoding step at each iteration, avoiding a bottleneck in previous gradient coding schemes. We show that our approach results in a valid ℓ2-subspace embedding, and that our resulting approximation converges to the optimal solution.
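A minimal single-machine sketch of the block-leverage-score sampling idea, assuming a least-squares objective (1/2)||Ax − b||²; the actual method distributes the replicated blocks across servers as a gradient code with no per-iteration decoding, whereas this sketch only illustrates how sampling blocks proportionally to their leverage scores yields an unbiased estimate of the full gradient. All sizes and the step size are illustrative assumptions.

```python
# Illustrative sketch: block leverage scores drive importance sampling of data
# blocks, giving an unbiased sketched gradient for steepest descent.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_blocks = 1200, 20, 30
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

blocks = np.array_split(np.arange(n), n_blocks)

# Block leverage scores: squared Frobenius norm of the corresponding rows of U,
# where A = U S V^T is a thin SVD.
U, _, _ = np.linalg.svd(A, full_matrices=False)
scores = np.array([np.sum(U[idx] ** 2) for idx in blocks])
probs = scores / scores.sum()

x = np.zeros(d)
step = 1.0 / np.linalg.norm(A, 2) ** 2
q = 8                                       # blocks sampled per iteration
for _ in range(300):
    # Sample q blocks proportionally to their leverage scores and build an
    # unbiased estimate of the full gradient A^T (A x - b).
    picked = rng.choice(n_blocks, size=q, p=probs)
    g = np.zeros(d)
    for i in picked:
        idx = blocks[i]
        g += A[idx].T @ (A[idx] @ x - b[idx]) / (q * probs[i])
    x -= step * g

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Dividing each sampled block's gradient by q·pᵢ is what makes the estimate unbiased, and blocks with large leverage scores are sampled (or, in the distributed scheme, replicated) more often because they contribute more to the geometry of the problem.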
- In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via ℓ1 regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.
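A toy two-dimensional illustration of the wedge-product feature idea, heavily simplified relative to the paper's construction: each feature here is the signed area (a 2-D wedge product) determined by a pair of training samples, and an ℓ1-penalized fit selects a small subset of those pairs. The feature definition, target function, and penalty value are all assumptions made for the example.

```python
# Illustrative sketch: signed-area (wedge-product) features from pairs of
# training samples, with l1 regularization selecting a few relevant pairs.
import numpy as np
from itertools import combinations
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 40, 2
X = rng.normal(size=(n, d))
y = np.maximum(X[:, 0] + X[:, 1], 0.0)      # a single-ReLU toy target

def wedge_features(Z, anchors):
    """Signed areas det[z - x_i, x_j - x_i] for every anchor pair (i, j)."""
    feats = []
    for i, j in anchors:
        u = Z - X[i]                         # (m, 2)
        v = X[j] - X[i]                      # (2,)
        feats.append(u[:, 0] * v[1] - u[:, 1] * v[0])
    return np.column_stack(feats)

pairs = list(combinations(range(n), 2))
F = wedge_features(X, pairs)

# l1 regularization keeps only a few pairs, i.e. a few wedge-product features.
model = Lasso(alpha=0.05, max_iter=50_000).fit(F, y)
print("selected pairs:", [pairs[k] for k in np.flatnonzero(model.coef_)])
print("train R^2:", model.score(F, y))
```

The point of the toy example is only the structure: features built from signed volumes generated by data vectors, with sparsity used to pick out the small subset of samples that matter, echoing the role ℓ1 regularization plays in the abstract.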